The expansion of Internet of Things (IoT) ecosystems and cyber–physical systems has shifted artificial intelligence processing from centralized cloud infrastructures to resource-constrained edge devices. Although edge computing enables lower latency, reduced network congestion, and enhanced data privacy, it also presents significant obstacles for deep learning training due to limited computational power, tight energy budgets, and hardware heterogeneity. To address these challenges, this paper introduces a hybrid parallel distributed deep learning framework tailored for heterogeneous edge environments. The proposed approach integrates data parallelism and model parallelism to efficiently utilize diverse edge resources, with workload distribution adaptively managed according to device capabilities, network variability, and energy availability. Experimental evaluations on image classification tasks show that the proposed framework achieves superior training efficiency, improved energy utilization, and enhanced model performance compared with centralized cloud training and single-parallelism edge learning methods.
Introduction
Edge computing is increasingly adopted for latency-sensitive applications such as autonomous driving, video surveillance, healthcare monitoring, and industrial control. While deep learning is essential for these applications, training deep neural networks directly on edge devices is challenging: individual devices offer limited computation, memory, and energy, hardware is heterogeneous across the fleet, and network bandwidth is constrained. Traditional cloud-centric or single-parallelism strategies are often inefficient in such environments.
To address these challenges, this paper proposes a hybrid parallel distributed deep learning framework that combines data parallelism and model parallelism for heterogeneous edge computing systems. The framework uses a centralized edge coordinator with a hybrid scheduler that dynamically assigns training tasks based on device capabilities, network conditions, and energy availability: high-capacity devices handle computation-intensive model layers via model parallelism, while resource-constrained devices participate through data parallelism. Adaptive gradient aggregation and optimized communication further reduce overhead.
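The scheduling idea above can be illustrated with a minimal sketch. The weighting of compute, energy, and bandwidth, the `Device` fields, and the number of model-parallel slots are all illustrative assumptions, not the paper's actual scheduler: the point is only that devices are ranked by a composite capability score, the strongest are assigned model-parallel layer hosting, and the rest join the data-parallel group.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    flops: float      # relative compute capacity (normalized)
    energy: float     # remaining energy budget in [0, 1]
    bandwidth: float  # link quality in [0, 1]

def capability_score(d: Device) -> float:
    # Composite score; the 0.5 / 0.3 / 0.2 weights are illustrative only.
    return 0.5 * d.flops + 0.3 * d.energy + 0.2 * d.bandwidth

def hybrid_schedule(devices: list[Device], model_parallel_slots: int = 2) -> dict:
    # Rank devices by capability; the strongest host partitioned model layers,
    # the remainder train on data shards (data parallelism).
    ranked = sorted(devices, key=capability_score, reverse=True)
    return {
        "model_parallel": [d.name for d in ranked[:model_parallel_slots]],
        "data_parallel": [d.name for d in ranked[model_parallel_slots:]],
    }
```

In a real coordinator these scores would be refreshed as network conditions and battery levels change, triggering re-assignment between the two groups.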
Experiments were conducted on a heterogeneous set of edge devices using the CIFAR-10 dataset and two models, ResNet-18 and MobileNetV2. The hybrid parallel approach outperformed both centralized and purely data-parallel training, achieving higher model accuracy, faster training per epoch, and lower energy consumption, and thereby demonstrating better convergence and scalability.
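For the model-parallel side of such a setup, the partitioning of layers across devices can be sketched as a capacity-proportional split. The greedy contiguous scheme below is an assumption for illustration, not the paper's reported algorithm: given per-layer compute costs and per-device capacities, each device receives a contiguous run of layers whose total cost approximates its proportional share.

```python
def partition_layers(layer_costs: list[float],
                     device_capacities: list[float]) -> list[list[int]]:
    """Greedily assign contiguous layer ranges to devices so that each
    device's total layer cost is roughly proportional to its capacity."""
    total_cost = sum(layer_costs)
    total_cap = sum(device_capacities)
    targets = [total_cost * c / total_cap for c in device_capacities]

    assignment, i = [], 0
    for target in targets[:-1]:
        start, acc = i, 0.0
        # Take layers while staying within this device's target share.
        while i < len(layer_costs) and acc + layer_costs[i] <= target + 1e-9:
            acc += layer_costs[i]
            i += 1
        assignment.append(list(range(start, i)))
    # The last device absorbs all remaining layers.
    assignment.append(list(range(i, len(layer_costs))))
    return assignment
```

For example, four equal-cost layers split over two devices with a 3:1 capacity ratio gives the first device three layers and the second one. A production partitioner would also weigh activation-transfer volume at each cut point, since inter-device communication is the dominant cost on constrained links.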
Conclusion
This study introduced a hybrid parallel distributed deep learning framework designed for heterogeneous edge computing environments. By jointly leveraging data parallelism and model parallelism, the proposed framework effectively mitigates resource limitations and device heterogeneity commonly encountered in edge systems. Experimental results demonstrate that the approach delivers improved training efficiency, reduced energy consumption, and competitive model accuracy when compared with conventional centralized and single-parallel training methods.
Future research will focus on extending the framework to support federated learning paradigms in order to further enhance data privacy and scalability. Additionally, the integration of transformer-based architectures will be explored to accommodate emerging deep learning applications with higher computational demands. Finally, real-world deployment and evaluation in smart city scenarios, such as intelligent traffic management and urban surveillance, will be pursued to validate the practicality and robustness of the proposed framework.